葫芦的运维日志

2026/02/25 14:58

AWS API Gateway with Terraform: A Hands-On Deep Dive

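API Gateway is a core component of AWS serverless architectures. Clicking through the console is enough to get something running, but in production, Infrastructure as Code is the right approach. This article works from the basics to advanced topics and walks through managing API Gateway with Terraform, including the pitfalls I have hit along the way.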

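1. REST API vs HTTP API: Pick the Right Type First

AWS offers two kinds of API Gateway, and choosing the wrong one means redoing the work later:

  • REST API (v1): full-featured, with support for WAF, request validation, usage plans, API keys, and request/response transformation. A good fit for public-facing production APIs.
  • HTTP API (v2): roughly 70% cheaper with lower latency, plus native JWT authorization and automatic deployment. A good fit for internal microservice traffic or simple proxying.

In short: use REST API for external traffic and HTTP API for internal traffic. Both are covered below.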

2. A Complete REST API Configuration in Terraform

Start with a production-grade REST API backed by Lambda:

# ============================================
# REST API definition
# ============================================
resource "aws_api_gateway_rest_api" "main" {
  name        = "${var.project}-api"
  description = "Production REST API"

  endpoint_configuration {
    types = ["REGIONAL"]  # REGIONAL / EDGE / PRIVATE
  }

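  # Only takes effect when the API is imported from an OpenAPI document via `body`:
  # "overwrite" replaces the existing definition, "merge" merges into it
  put_rest_api_mode = "overwrite"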

  tags = var.common_tags
}

# ============================================
# Resource path: /users/{userId}
# ============================================
resource "aws_api_gateway_resource" "users" {
  rest_api_id = aws_api_gateway_rest_api.main.id
  parent_id   = aws_api_gateway_rest_api.main.root_resource_id
  path_part   = "users"
}

resource "aws_api_gateway_resource" "user_by_id" {
  rest_api_id = aws_api_gateway_rest_api.main.id
  parent_id   = aws_api_gateway_resource.users.id
  path_part   = "{userId}"
}

# ============================================
# GET /users/{userId}
# ============================================
resource "aws_api_gateway_method" "get_user" {
  rest_api_id   = aws_api_gateway_rest_api.main.id
  resource_id   = aws_api_gateway_resource.user_by_id.id
  http_method   = "GET"
  authorization = "COGNITO_USER_POOLS"
  authorizer_id = aws_api_gateway_authorizer.cognito.id

  request_parameters = {
    "method.request.path.userId"          = true
    "method.request.header.Authorization" = true
  }

  # Request validator
  request_validator_id = aws_api_gateway_request_validator.params.id
}

# Lambda integration
resource "aws_api_gateway_integration" "get_user" {
  rest_api_id             = aws_api_gateway_rest_api.main.id
  resource_id             = aws_api_gateway_resource.user_by_id.id
  http_method             = aws_api_gateway_method.get_user.http_method
  integration_http_method = "POST"  # Lambda is always invoked with POST
  type                    = "AWS_PROXY"
  uri                     = aws_lambda_function.get_user.invoke_arn

  # Integration timeout (29 seconds is the default maximum)
  timeout_milliseconds = 29000
}

# Lambda permission
resource "aws_lambda_permission" "apigw_get_user" {
  statement_id  = "AllowAPIGatewayInvoke"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.get_user.function_name
  principal     = "apigateway.amazonaws.com"
  source_arn    = "${aws_api_gateway_rest_api.main.execution_arn}/*/*"
}
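
The permission above lets every stage, method, and path of this API invoke the function. As a minimal sketch of a tighter scope, the source ARN can spell out the method and resource path, since it has the form execution_arn/stage/METHOD/path. The resource name apigw_get_user_scoped and the statement id are hypothetical, and this would replace the broader permission rather than sit next to it.

```hcl
# Sketch: allow only GET /users/{userId}, on any stage, to invoke the function.
# The "*" after /users/ stands in for the {userId} path parameter.
resource "aws_lambda_permission" "apigw_get_user_scoped" {
  statement_id  = "AllowGetUserInvoke" # hypothetical name
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.get_user.function_name
  principal     = "apigateway.amazonaws.com"
  source_arn    = "${aws_api_gateway_rest_api.main.execution_arn}/*/GET/users/*"
}
```

The trade-off is one permission per route instead of one per function, which is more to maintain but keeps an unrelated route from being wired to this function by accident.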

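3. Request Validation: Stop Junk Requests Before They Reach Lambda

Many people overlook this feature and let invalid requests hit Lambda, paying for invocations that do nothing useful. The Terraform configuration:

# Request validator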
resource "aws_api_gateway_request_validator" "params" {
  name                        = "validate-params"
  rest_api_id                 = aws_api_gateway_rest_api.main.id
  validate_request_parameters = true
  validate_request_body       = true
}

# Request model (JSON Schema validation of the request body)
resource "aws_api_gateway_model" "create_user" {
  rest_api_id  = aws_api_gateway_rest_api.main.id
  name         = "CreateUserRequest"
  content_type = "application/json"

  schema = jsonencode({
    "$schema" = "http://json-schema.org/draft-04/schema#"
    type      = "object"
    required  = ["name", "email"]
    properties = {
      name = {
        type      = "string"
        minLength = 1
        maxLength = 100
      }
      email = {
        type    = "string"
        pattern = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
      }
      age = {
        type    = "integer"
        minimum = 0
        maximum = 150
      }
    }
  })
}

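# POST /users, validated against the model
resource "aws_api_gateway_method" "create_user" {
  rest_api_id          = aws_api_gateway_rest_api.main.id
  resource_id          = aws_api_gateway_resource.users.id
  http_method          = "POST"
  authorization        = "COGNITO_USER_POOLS"
  authorizer_id        = aws_api_gateway_authorizer.cognito.id
  request_validator_id = aws_api_gateway_request_validator.params.id

  request_models = {
    "application/json" = aws_api_gateway_model.create_user.name
  }
}

# Integration for POST /users; the deployment triggers reference it by name.
# Assumes aws_lambda_function.create_user is defined elsewhere and has its own
# aws_lambda_permission, like get_user above.
resource "aws_api_gateway_integration" "create_user" {
  rest_api_id             = aws_api_gateway_rest_api.main.id
  resource_id             = aws_api_gateway_resource.users.id
  http_method             = aws_api_gateway_method.create_user.http_method
  integration_http_method = "POST"
  type                    = "AWS_PROXY"
  uri                     = aws_lambda_function.create_user.invoke_arn
}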

With this in place, a request with an empty name or a malformed email is rejected by API Gateway with a 400, and Lambda is never invoked.

4. Cognito Authorizer

resource "aws_api_gateway_authorizer" "cognito" {
  name            = "cognito-authorizer"
  rest_api_id     = aws_api_gateway_rest_api.main.id
  type            = "COGNITO_USER_POOLS"
  identity_source = "method.request.header.Authorization"

  provider_arns = [aws_cognito_user_pool.main.arn]

  # Cache TTL in seconds for authorization results, to reduce calls to Cognito
  authorizer_result_ttl_in_seconds = 300
}

# For more flexible authorization logic, use a Lambda authorizer
resource "aws_api_gateway_authorizer" "lambda_auth" {
  name                   = "lambda-authorizer"
  rest_api_id            = aws_api_gateway_rest_api.main.id
  type                   = "TOKEN"
  authorizer_uri         = aws_lambda_function.authorizer.invoke_arn
  authorizer_credentials = aws_iam_role.apigw_auth_invocation.arn
  identity_source        = "method.request.header.Authorization"

  # Caching: the same token does not trigger another Lambda call for 5 minutes
  authorizer_result_ttl_in_seconds = 300
}
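
The Lambda authorizer above refers to aws_iam_role.apigw_auth_invocation, which the listing does not define. A minimal sketch of what that role might look like follows; the name values are hypothetical, and aws_lambda_function.authorizer is assumed to exist as referenced above.

```hcl
# Sketch of the role passed to authorizer_credentials.
resource "aws_iam_role" "apigw_auth_invocation" {
  name = "${var.project}-apigw-auth-invocation" # hypothetical name

  # Let the API Gateway service assume this role
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "apigateway.amazonaws.com" }
    }]
  })
}

# Allow the role to invoke the authorizer function and nothing else
resource "aws_iam_role_policy" "apigw_auth_invocation" {
  name = "invoke-authorizer" # hypothetical name
  role = aws_iam_role.apigw_auth_invocation.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect   = "Allow"
      Action   = "lambda:InvokeFunction"
      Resource = aws_lambda_function.authorizer.arn
    }]
  })
}
```

An alternative is to drop authorizer_credentials and grant access with an aws_lambda_permission on the authorizer function instead; either approach works, so pick one and use it consistently.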

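5. Deployments and Stages: Where Most People Get Burned

API Gateway's deployment model confuses a lot of people. Changed the configuration and nothing happened? Most likely no new deployment was triggered.

# Deployment: the key is triggers
resource "aws_api_gateway_deployment" "main" {
  rest_api_id = aws_api_gateway_rest_api.main.id

  # Core trick: redeploy whenever any of the related resources changes.
  # Note: an id only changes when the resource is replaced, not when it is updated in place.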
  triggers = {
    redeployment = sha1(jsonencode([
      aws_api_gateway_resource.users.id,
      aws_api_gateway_resource.user_by_id.id,
      aws_api_gateway_method.get_user.id,
      aws_api_gateway_method.create_user.id,
      aws_api_gateway_integration.get_user.id,
      aws_api_gateway_integration.create_user.id,
    ]))
  }

  lifecycle {
    create_before_destroy = true
  }
}

# Stage
resource "aws_api_gateway_stage" "prod" {
  deployment_id = aws_api_gateway_deployment.main.id
  rest_api_id   = aws_api_gateway_rest_api.main.id
  stage_name    = "prod"

  # Access logging
  access_log_settings {
    destination_arn = aws_cloudwatch_log_group.apigw.arn
    format = jsonencode({
      requestId          = "$context.requestId"
      ip                 = "$context.identity.sourceIp"
      caller             = "$context.identity.caller"
      user               = "$context.identity.user"
      requestTime        = "$context.requestTime"
      httpMethod         = "$context.httpMethod"
      resourcePath       = "$context.resourcePath"
      status             = "$context.status"
      protocol           = "$context.protocol"
      responseLength     = "$context.responseLength"
      errorMessage       = "$context.error.message"
      integrationLatency = "$context.integration.latency"
    })
  }

  # Stage variables (can be referenced from integrations)
  variables = {
    env          = "prod"
    lambda_alias = "live"
  }

  tags = var.common_tags
}
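
The lambda_alias stage variable only does something once an integration URI refers to it. The sketch below shows one way that could look, assuming a Lambda alias named live already exists on the get_user function; the local name and the second permission resource are hypothetical. Note the doubled dollar sign, which stops Terraform from interpolating the API Gateway placeholder.

```hcl
locals {
  # "$${...}" is Terraform's escape for a literal "${...}", so API Gateway
  # receives ${stageVariables.lambda_alias} and resolves it per stage.
  get_user_alias_uri = "arn:aws:apigateway:${var.region}:lambda:path/2015-03-31/functions/${aws_lambda_function.get_user.arn}:$${stageVariables.lambda_alias}/invocations"
}

# The permission has to be granted on the alias itself, not only on the function.
resource "aws_lambda_permission" "apigw_get_user_live" {
  statement_id  = "AllowAPIGatewayInvokeLiveAlias" # hypothetical name
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.get_user.function_name
  qualifier     = "live" # assumes an aws_lambda_alias named "live" exists
  principal     = "apigateway.amazonaws.com"
  source_arn    = "${aws_api_gateway_rest_api.main.execution_arn}/*/*"
}
```

To use it, set uri = local.get_user_alias_uri on the integration in place of invoke_arn. Each stage can then point at a different alias of the same function.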

# Method-level settings (throttling, caching)
resource "aws_api_gateway_method_settings" "prod" {
  rest_api_id = aws_api_gateway_rest_api.main.id
  stage_name  = aws_api_gateway_stage.prod.stage_name
  method_path = "*/*"

  settings {
    # Throttling
    throttling_burst_limit = 500
    throttling_rate_limit  = 1000

    # Log level
    logging_level   = "INFO"
    metrics_enabled = true

    # Caching (enable as needed; it costs extra)
    caching_enabled      = false
    cache_ttl_in_seconds = 300
  }
}
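
Setting logging_level on a REST API stage only works once API Gateway has a CloudWatch Logs role configured at the account level. A minimal sketch of that one-time setup follows; the resource and role names are hypothetical, and the setting applies to the whole account in the current region, so it usually belongs in a shared stack rather than next to a single API.

```hcl
# Role that API Gateway assumes to write execution logs
resource "aws_iam_role" "apigw_cloudwatch" {
  name = "${var.project}-apigw-cloudwatch" # hypothetical name

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Action    = "sts:AssumeRole"
      Principal = { Service = "apigateway.amazonaws.com" }
    }]
  })
}

# AWS managed policy for pushing API Gateway logs to CloudWatch
resource "aws_iam_role_policy_attachment" "apigw_cloudwatch" {
  role       = aws_iam_role.apigw_cloudwatch.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonAPIGatewayPushToCloudWatchLogs"
}

# Account-level (per region) setting shared by every REST API
resource "aws_api_gateway_account" "this" {
  cloudwatch_role_arn = aws_iam_role.apigw_cloudwatch.arn

  depends_on = [aws_iam_role_policy_attachment.apigw_cloudwatch]
}
```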

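Pitfall: if you add a method or integration but do not add it to triggers, the deployment is not recreated and the change never goes live. Referencing .id has a second limitation: the id stays the same when a resource is updated in place, so an edit to an existing method or integration does not change the hash either. This is the most common trap when managing API Gateway with Terraform.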
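
One way to narrow this pitfall, mentioned in the Terraform AWS provider documentation as a more advanced pattern, is to hash whole resources rather than their ids. The sketch below is an alternative form of the same aws_api_gateway_deployment.main resource, not an additional one. Expect a one-time diff when switching over, after which the hash should change only when one of the listed resources changes.

```hcl
# Sketch: same deployment as above, with whole resources in the hash.
resource "aws_api_gateway_deployment" "main" {
  rest_api_id = aws_api_gateway_rest_api.main.id

  triggers = {
    # Dropping ".id" serializes every attribute, so in-place edits
    # (for example a changed authorizer or timeout) alter the hash too.
    redeployment = sha1(jsonencode([
      aws_api_gateway_resource.users,
      aws_api_gateway_resource.user_by_id,
      aws_api_gateway_method.get_user,
      aws_api_gateway_method.create_user,
      aws_api_gateway_integration.get_user,
      aws_api_gateway_integration.create_user,
    ]))
  }

  lifecycle {
    create_before_destroy = true
  }
}
```

New routes still have to be added to the list by hand; the whole-resource hash only removes the blind spot for edits to resources already listed.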

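6. Custom Domain + Route53

# ACM certificate. A REGIONAL domain needs the certificate in the same region as the API;
# only EDGE domains need it in us-east-1 (add `provider = aws.us_east_1` and use
# `certificate_arn` on the domain name instead of `regional_certificate_arn`).
resource "aws_acm_certificate" "api" {
  domain_name       = "api.example.com"
  validation_method = "DNS"

  lifecycle {
    create_before_destroy = true
  }
}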

# Custom domain name
resource "aws_api_gateway_domain_name" "api" {
  domain_name              = "api.example.com"
  regional_certificate_arn = aws_acm_certificate.api.arn

  endpoint_configuration {
    types = ["REGIONAL"]
  }
}

# Base path mapping
resource "aws_api_gateway_base_path_mapping" "api" {
  api_id      = aws_api_gateway_rest_api.main.id
  stage_name  = aws_api_gateway_stage.prod.stage_name
  domain_name = aws_api_gateway_domain_name.api.domain_name
  base_path   = ""  # empty string = root path
}

# Route53 record
resource "aws_route53_record" "api" {
  zone_id = data.aws_route53_zone.main.zone_id
  name    = "api.example.com"
  type    = "A"

  alias {
    name                   = aws_api_gateway_domain_name.api.regional_domain_name
    zone_id                = aws_api_gateway_domain_name.api.regional_zone_id
    evaluate_target_health = true
  }
}
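
A DNS-validated certificate cannot be attached to the custom domain until ACM has issued it, and the listing above does not create the validation records. The sketch below follows the usual Route53 pattern for that step; the resource name cert_validation is hypothetical, while aws_acm_certificate.api and data.aws_route53_zone.main are the ones used above.

```hcl
# One validation record per domain name on the certificate
resource "aws_route53_record" "cert_validation" {
  for_each = {
    for dvo in aws_acm_certificate.api.domain_validation_options : dvo.domain_name => {
      name   = dvo.resource_record_name
      record = dvo.resource_record_value
      type   = dvo.resource_record_type
    }
  }

  allow_overwrite = true
  zone_id         = data.aws_route53_zone.main.zone_id
  name            = each.value.name
  type            = each.value.type
  records         = [each.value.record]
  ttl             = 60
}

# Blocks until ACM reports the certificate as issued
resource "aws_acm_certificate_validation" "api" {
  certificate_arn         = aws_acm_certificate.api.arn
  validation_record_fqdns = [for record in aws_route53_record.cert_validation : record.fqdn]
}
```

With this in place, the domain name can reference aws_acm_certificate_validation.api.certificate_arn instead of the certificate directly, so Terraform waits for issuance before creating the domain.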

7. WAF Integration: A Production Must-Have

resource "aws_wafv2_web_acl" "api" {
  name  = "${var.project}-api-waf"
  scope = "REGIONAL"

  default_action {
    allow {}
  }

  # Rate limit: at most 2000 requests per IP in any 5-minute window
  rule {
    name     = "rate-limit"
    priority = 1

    action {
      block {}
    }

    statement {
      rate_based_statement {
        limit              = 2000
        aggregate_key_type = "IP"
      }
    }

    visibility_config {
      sampled_requests_enabled   = true
      cloudwatch_metrics_enabled = true
      metric_name                = "RateLimit"
    }
  }

  # AWS managed rules: SQL injection protection
  rule {
    name     = "aws-managed-sql"
    priority = 2

    override_action {
      none {}
    }

    statement {
      managed_rule_group_statement {
        name        = "AWSManagedRulesSQLiRuleSet"
        vendor_name = "AWS"
      }
    }

    visibility_config {
      sampled_requests_enabled   = true
      cloudwatch_metrics_enabled = true
      metric_name                = "SQLInjection"
    }
  }

  # AWS managed rules: common attack protection
  rule {
    name     = "aws-managed-common"
    priority = 3

    override_action {
      none {}
    }

    statement {
      managed_rule_group_statement {
        name        = "AWSManagedRulesCommonRuleSet"
        vendor_name = "AWS"

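        # Stop a false-positive rule from blocking: count matches instead.
        # (excluded_rule is deprecated in favor of rule_action_override.)
        rule_action_override {
          name = "SizeRestrictions_BODY"
          action_to_use {
            count {}
          }
        }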
      }
    }

    visibility_config {
      sampled_requests_enabled   = true
      cloudwatch_metrics_enabled = true
      metric_name                = "CommonRules"
    }
  }

  visibility_config {
    sampled_requests_enabled   = true
    cloudwatch_metrics_enabled = true
    metric_name                = "APIGatewayWAF"
  }
}

# Associate the WAF with the API Gateway stage
resource "aws_wafv2_web_acl_association" "api" {
  resource_arn = aws_api_gateway_stage.prod.arn
  web_acl_arn  = aws_wafv2_web_acl.api.arn
}

8. HTTP API (v2): The Lightweight Option

If you do not need heavyweight features such as WAF and request validation, HTTP API is both simpler and cheaper:

resource "aws_apigatewayv2_api" "http" {
  name          = "${var.project}-http-api"
  protocol_type = "HTTP"

  cors_configuration {
    allow_origins = ["https://www.example.com"]
    allow_methods = ["GET", "POST", "PUT", "DELETE", "OPTIONS"]
    allow_headers = ["Content-Type", "Authorization"]
    max_age       = 3600
  }
}

# JWT authorizer (natively supported by HTTP API, no Lambda needed)
resource "aws_apigatewayv2_authorizer" "jwt" {
  api_id           = aws_apigatewayv2_api.http.id
  authorizer_type  = "JWT"
  identity_sources = ["$request.header.Authorization"]
  name             = "jwt-authorizer"

  jwt_configuration {
    audience = [aws_cognito_user_pool_client.main.id]
    issuer   = "https://cognito-idp.${var.region}.amazonaws.com/${aws_cognito_user_pool.main.id}"
  }
}

# Lambda integration
resource "aws_apigatewayv2_integration" "lambda" {
  api_id                 = aws_apigatewayv2_api.http.id
  integration_type       = "AWS_PROXY"
  integration_uri        = aws_lambda_function.handler.invoke_arn
  payload_format_version = "2.0"  # the 2.0 event format is more compact
}

# Route
resource "aws_apigatewayv2_route" "get_users" {
  api_id             = aws_apigatewayv2_api.http.id
  route_key          = "GET /users/{userId}"
  target             = "integrations/${aws_apigatewayv2_integration.lambda.id}"
  authorization_type = "JWT"
  authorizer_id      = aws_apigatewayv2_authorizer.jwt.id
}

# Stage with automatic deployment
resource "aws_apigatewayv2_stage" "prod" {
  api_id      = aws_apigatewayv2_api.http.id
  name        = "prod"
  auto_deploy = true  # route changes deploy automatically, no deployment resource to manage

  access_log_settings {
    destination_arn = aws_cloudwatch_log_group.http_api.arn
    format = jsonencode({
      requestId = "$context.requestId"
      ip        = "$context.identity.sourceIp"
      method    = "$context.httpMethod"
      path      = "$context.path"
      status    = "$context.status"
      latency   = "$context.responseLatency"
    })
  }

  default_route_settings {
    throttling_burst_limit = 500
    throttling_rate_limit  = 1000
  }
}
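
As with the REST API, the HTTP API still needs permission to invoke the function; the listing above omits it. A minimal sketch, with a hypothetical resource name and statement id:

```hcl
# Allow any stage and route of the HTTP API to invoke the handler function
resource "aws_lambda_permission" "http_api" {
  statement_id  = "AllowHttpApiInvoke" # hypothetical name
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.handler.function_name
  principal     = "apigateway.amazonaws.com"
  source_arn    = "${aws_apigatewayv2_api.http.execution_arn}/*/*"
}
```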

9. Usage Plan + API Key: Metering a Public API

If third parties will call your API, you need to cap and meter their usage, for example as a basis for billing:

resource "aws_api_gateway_usage_plan" "partner" {
  name = "partner-plan"

  api_stages {
    api_id = aws_api_gateway_rest_api.main.id
    stage  = aws_api_gateway_stage.prod.stage_name
  }

  # Throttling
  throttle_settings {
    burst_limit = 100
    rate_limit  = 50
  }

  # Quota
  quota_settings {
    limit  = 10000
    period = "MONTH"
  }
}

# API Key
resource "aws_api_gateway_api_key" "partner_a" {
  name    = "partner-a-key"
  enabled = true
}

# Attach the key to the usage plan
resource "aws_api_gateway_usage_plan_key" "partner_a" {
  key_id        = aws_api_gateway_api_key.partner_a.id
  key_type      = "API_KEY"
  usage_plan_id = aws_api_gateway_usage_plan.partner.id
}

# Require an API key on the method
# (aws_api_gateway_resource.data is assumed to be defined elsewhere)
resource "aws_api_gateway_method" "get_data" {
  rest_api_id      = aws_api_gateway_rest_api.main.id
  resource_id      = aws_api_gateway_resource.data.id
  http_method      = "GET"
  authorization    = "NONE"
  api_key_required = true  # key point: callers must send the x-api-key header
}
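
The partner needs the generated key value to put in the x-api-key header. One way to retrieve it, sketched here with a hypothetical output name, is a sensitive output:

```hcl
# Expose the generated key without printing it in plan/apply output
output "partner_a_api_key" {
  value     = aws_api_gateway_api_key.partner_a.value
  sensitive = true
}
```

After an apply, terraform output -raw partner_a_api_key prints the value. Keep in mind that the key is also stored in the Terraform state, so the state backend needs to be access-controlled.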

10. Modularization: Organizing a Large Project

Once an API has more than 20 routes, a single file becomes unmaintainable. A layout I recommend:

terraform/
├── modules/
│   └── api-route/
│       ├── main.tf      # resource + method + integration
│       ├── variables.tf
│       └── outputs.tf
├── api-gateway.tf       # REST API, deployment, stage
├── api-routes.tf        # module calls that define every route
├── api-authorizer.tf    # authorizers
├── api-domain.tf        # custom domain
├── api-waf.tf           # WAF rules
└── variables.tf

The route module:

# modules/api-route/ (main.tf, variables.tf, and outputs.tf shown together)
variable "rest_api_id" {}
variable "parent_id" {}
variable "path_part" {}
variable "http_method" {}
variable "lambda_invoke_arn" {}
variable "authorizer_id" { default = null }
variable "authorization" { default = "NONE" }

resource "aws_api_gateway_resource" "this" {
  rest_api_id = var.rest_api_id
  parent_id   = var.parent_id
  path_part   = var.path_part
}

resource "aws_api_gateway_method" "this" {
  rest_api_id   = var.rest_api_id
  resource_id   = aws_api_gateway_resource.this.id
  http_method   = var.http_method
  authorization = var.authorization
  authorizer_id = var.authorizer_id
}

resource "aws_api_gateway_integration" "this" {
  rest_api_id             = var.rest_api_id
  resource_id             = aws_api_gateway_resource.this.id
  http_method             = aws_api_gateway_method.this.http_method
  integration_http_method = "POST"
  type                    = "AWS_PROXY"
  uri                     = var.lambda_invoke_arn
}

output "resource_id" {
  value = aws_api_gateway_resource.this.id
}

output "method_id" {
  value = aws_api_gateway_method.this.id
}

output "integration_id" {
  value = aws_api_gateway_integration.this.id
}

Usage:

# api-routes.tf
module "route_get_users" {
  source            = "./modules/api-route"
  rest_api_id       = aws_api_gateway_rest_api.main.id
  parent_id         = aws_api_gateway_rest_api.main.root_resource_id
  path_part         = "users"
  http_method       = "GET"
  lambda_invoke_arn = module.lambda_get_users.invoke_arn
  authorization     = "COGNITO_USER_POOLS"
  authorizer_id     = aws_api_gateway_authorizer.cognito.id
}

module "route_create_order" {
  source            = "./modules/api-route"
  rest_api_id       = aws_api_gateway_rest_api.main.id
  parent_id         = aws_api_gateway_rest_api.main.root_resource_id
  path_part         = "orders"
  http_method       = "POST"
  lambda_invoke_arn = module.lambda_create_order.invoke_arn
  authorization     = "COGNITO_USER_POOLS"
  authorizer_id     = aws_api_gateway_authorizer.cognito.id
}
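
Once routes live in modules, the deployment triggers have to be fed from the module outputs instead of from resources in the root module. The sketch below shows how aws_api_gateway_deployment.main in api-gateway.tf might look for the two routes above; it is the same deployment resource as before with a different triggers list, not a second one.

```hcl
# api-gateway.tf (sketch): redeploy when any route module output changes
resource "aws_api_gateway_deployment" "main" {
  rest_api_id = aws_api_gateway_rest_api.main.id

  triggers = {
    redeployment = sha1(jsonencode([
      module.route_get_users.resource_id,
      module.route_get_users.method_id,
      module.route_get_users.integration_id,
      module.route_create_order.resource_id,
      module.route_create_order.method_id,
      module.route_create_order.integration_id,
    ]))
  }

  lifecycle {
    create_before_destroy = true
  }
}
```

Referencing the outputs also gives Terraform the ordering it needs: the deployment is created only after every method and integration inside the modules exists.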

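11. Common Pitfalls

  1. Deployment does not update: the changed resource is missing from triggers, so Terraform sees no reason to redeploy. Put every method and integration into triggers, and remember that an .id reference only changes when the resource is replaced.
  2. 500 Internal Server Error from a missing Lambda permission: without aws_lambda_permission, API Gateway is not allowed to invoke the function. The client sees a 500, and the execution log reports invalid permissions on the Lambda function.
  3. CORS has no effect: REST API needs a hand-built OPTIONS method with a Mock integration that returns the CORS headers, and with AWS_PROXY the Lambda itself must add the CORS headers to its real responses; HTTP API only needs cors_configuration.
  4. Stage variable references: an integration URI refers to a stage variable as ${stageVariables.xxx}, but Terraform treats that as interpolation. Fix: escape it as $${stageVariables.xxx}.
  5. Binary media types: REST API does not handle binary payloads by default. Set binary_media_types = ["*/*"] on aws_api_gateway_rest_api and have Lambda return isBase64Encoded: true.
  6. 29-second timeout: the default maximum integration timeout for a REST API is 29 seconds. Since 2024 AWS allows raising it for Regional and private REST APIs through a quota increase, which may come with a lower account-level throttle limit. For long-running work, go asynchronous instead: API Gateway → Lambda (asynchronous invocation) → client polls for the result.
  7. Payload size limits: REST API caps the request body at 10 MB and the response body at 10 MB, and a synchronous Lambda invocation is limited to 6 MB in each direction. For anything larger, consider S3 presigned URLs.
  8. terraform destroy ordering: the WAF association must be destroyed before the stage, otherwise the destroy hangs. Referencing aws_api_gateway_stage.prod.arn already gives Terraform that ordering; if the stage ARN is passed in as a plain string, declare the dependency explicitly with depends_on.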
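
Pitfall 3 is the one that takes the most configuration to get right on a REST API. The sketch below wires up a CORS preflight response for /users with a Mock integration; the users_options resource names and the allowed origin, methods, and headers are illustrative assumptions to adapt.

```hcl
# Preflight requests carry no credentials, so OPTIONS must be unauthenticated
resource "aws_api_gateway_method" "users_options" {
  rest_api_id   = aws_api_gateway_rest_api.main.id
  resource_id   = aws_api_gateway_resource.users.id
  http_method   = "OPTIONS"
  authorization = "NONE"
}

# The mock integration answers 200 without calling any backend
resource "aws_api_gateway_integration" "users_options" {
  rest_api_id = aws_api_gateway_rest_api.main.id
  resource_id = aws_api_gateway_resource.users.id
  http_method = aws_api_gateway_method.users_options.http_method
  type        = "MOCK"

  request_templates = {
    "application/json" = jsonencode({ statusCode = 200 })
  }
}

# Declare which headers the method is allowed to return
resource "aws_api_gateway_method_response" "users_options" {
  rest_api_id = aws_api_gateway_rest_api.main.id
  resource_id = aws_api_gateway_resource.users.id
  http_method = aws_api_gateway_method.users_options.http_method
  status_code = "200"

  response_parameters = {
    "method.response.header.Access-Control-Allow-Origin"  = true
    "method.response.header.Access-Control-Allow-Methods" = true
    "method.response.header.Access-Control-Allow-Headers" = true
  }
}

# Fill in the header values; static values must be wrapped in single quotes
resource "aws_api_gateway_integration_response" "users_options" {
  rest_api_id = aws_api_gateway_rest_api.main.id
  resource_id = aws_api_gateway_resource.users.id
  http_method = aws_api_gateway_method.users_options.http_method
  status_code = aws_api_gateway_method_response.users_options.status_code

  response_parameters = {
    "method.response.header.Access-Control-Allow-Origin"  = "'https://www.example.com'"
    "method.response.header.Access-Control-Allow-Methods" = "'GET,POST,OPTIONS'"
    "method.response.header.Access-Control-Allow-Headers" = "'Content-Type,Authorization'"
  }

  depends_on = [aws_api_gateway_integration.users_options]
}
```

These resources need to be included in the deployment triggers like any other method and integration, otherwise pitfall 1 applies to them as well.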

12. Monitoring and Alarms

# CloudWatch log group
resource "aws_cloudwatch_log_group" "apigw" {
  name              = "/aws/apigateway/${var.project}"
  retention_in_days = 30
}

# 5xx error alarm
resource "aws_cloudwatch_metric_alarm" "api_5xx" {
  alarm_name          = "${var.project}-api-5xx"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "5XXError"
  namespace           = "AWS/ApiGateway"
  period              = 300
  statistic           = "Sum"
  threshold           = 10
  alarm_description   = "API Gateway 5xx errors exceeded threshold"

  dimensions = {
    ApiName = aws_api_gateway_rest_api.main.name
    Stage   = aws_api_gateway_stage.prod.stage_name
  }

  alarm_actions = [aws_sns_topic.alerts.arn]
}

# Latency alarm
resource "aws_cloudwatch_metric_alarm" "api_latency" {
  alarm_name          = "${var.project}-api-latency"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 3
  metric_name         = "Latency"
  namespace           = "AWS/ApiGateway"
  period              = 300
  extended_statistic  = "p99"
  threshold           = 5000  # 5 seconds
  alarm_description   = "API Gateway p99 latency exceeded 5s"

  dimensions = {
    ApiName = aws_api_gateway_rest_api.main.name
    Stage   = aws_api_gateway_stage.prod.stage_name
  }

  alarm_actions = [aws_sns_topic.alerts.arn]
}

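That wraps up a complete, hands-on pass through managing AWS API Gateway with Terraform, covering what production needs from type selection to deployment and from security to monitoring. The key takeaways: REST API is full-featured but verbose to configure, while HTTP API is simpler but more limited; the deployment triggers must cover every change; WAF and monitoring are standard equipment in production.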
